Neural RST-Style Discourse Parsing Exploiting Agreement Sub-trees as Silver Data
Authors
Abstract
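In rhetorical structure parsing, parsers built on classifiers such as neural networks are trained by supervised learning. However, RST-DT, the largest existing corpus, contains only 385 documents, which is hardly enough to train a neural network. This shortage of training data degrades performance when predicting rhetorical relation labels, which have many classes and a skewed frequency distribution. We therefore propose a neural rhetorical structure parsing method that uses a silver dataset, that is, data whose rhetorical structures were assigned automatically. The silver dataset consists of the subtrees shared among the rhetorical structure trees produced by multiple parsers. It is used to pre-train a neural rhetorical structure parser, which is then fine-tuned on manually annotated gold data. In experiments on the RST-DT corpus, the proposed method achieved micro-F1 scores of 64.7 for nuclearity and 54.1 for relation under Original Parseval evaluation.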
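The agreement step described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the tree encoding (an EDU index for a leaf, a `(label, left, right)` tuple for an internal node), the label strings, and the function names `subtrees` and `agreement_subtrees` are all assumptions made for this example, and the paper may define agreement between parsers differently.

```python
from typing import List, Set, Tuple, Union

# Hypothetical encoding: a leaf is an EDU index, an internal node is
# (label, left, right), where label combines nuclearity and relation.
Tree = Union[int, Tuple[str, "Tree", "Tree"]]


def subtrees(tree: Tree) -> Set[Tree]:
    """Return every internal subtree of `tree` (leaves are skipped)."""
    if isinstance(tree, int):
        return set()
    _, left, right = tree
    return {tree} | subtrees(left) | subtrees(right)


def agreement_subtrees(trees: List[Tree]) -> Set[Tree]:
    """Return the maximal subtrees that appear identically in every tree."""
    common = set.intersection(*(subtrees(t) for t in trees))
    # A subtree nested inside another agreed subtree is not maximal.
    nested: Set[Tree] = set()
    for t in common:
        nested |= subtrees(t) - {t}
    return common - nested


# Two hypothetical parser outputs over EDUs 1..4 that disagree only on
# the nuclearity of the right-hand span.
parser_a = ("NS:Elaboration", ("NN:Joint", 1, 2), ("NS:Attribution", 3, 4))
parser_b = ("NS:Elaboration", ("NN:Joint", 1, 2), ("SN:Attribution", 3, 4))

silver = agreement_subtrees([parser_a, parser_b])
```

Here only the `("NN:Joint", 1, 2)` subtree survives, because the disagreement on the right span also makes the two root nodes differ. In the pipeline the abstract describes, subtrees collected this way would serve as pre-training data before fine-tuning on the gold RST-DT annotations.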
Similar resources
Cross-lingual RST Discourse Parsing
Discourse parsing is an integral part of understanding information flow and argumentative structure in documents. Most previous research has focused on inducing and evaluating models from the English RST Discourse Treebank. However, discourse treebanks for other languages exist, including Spanish, German, Basque, Dutch and Brazilian Portuguese. The treebanks share the same underlying linguistic...
Towards Cross-Domain PDTB-Style Discourse Parsing
Discourse relation parsing is an important task with the goal of understanding text beyond the sentence boundaries. With the availability of annotated corpora (Penn Discourse Treebank) statistical discourse parsers were developed. In the literature it was shown that the discourse parsing subtasks of discourse connective detection and relation sense classification do not generalize well across d...
Better Document-level Sentiment Analysis from RST Discourse Parsing
Discourse structure is the hidden link between surface features and document-level properties, such as sentiment polarity. We show that the discourse analyses produced by Rhetorical Structure Theory (RST) parsers can improve document-level sentiment analysis, via composition of local information up the discourse tree. First, we show that reweighting discourse units according to their position i...
Exploiting Scope for Shallow Discourse Parsing
We present an approach to automatically identifying the arguments of discourse connectives based on data from the Penn Discourse Treebank. Of the two arguments of connectives, called Arg1 and Arg2, we focus on Arg1, which has proven more challenging to identify. Our approach employs a sentence-based representation of arguments, and distinguishes intra-sentential connectives, which take both the...
Empirical comparison of dependency conversions for RST discourse trees
Two heuristic rules that transform Rhetorical Structure Theory discourse trees into discourse dependency trees (DDTs) have recently been proposed (Hirao et al., 2013; Li et al., 2014), but these rules derive significantly different DDTs because their conversion schemes on multinuclear relations are not identical. This paper reveals the difference among DDT formats with respect to the following ...
Journal
Journal title: Shizen gengo shori
Year: 2022
ISSN: 1340-7619, 2185-8314
DOI: https://doi.org/10.5715/jnlp.29.875